---
title: "External Validity: Framework, Design and Analysis"
subtitle: "ReadME and Codebook"
author: "Naoki Egami and Erin Hartman"
date: \today
output:
  html_document:
    df_print: paged
---


```{r setup, include=FALSE}
rm(list = ls())
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(warning = FALSE)
knitr::opts_chunk$set(message = FALSE)
```

```{r load_libraries, include = FALSE}
library(grf)
library(foreign)
library(lmtest)
library(sandwich)
library(survey)
library(MASS)
library(parallel)
library(estimatr)
library(bartCause)
library(tidyverse)
library(knitr)
library(kableExtra)
library(dataverse)
nboot = 1000


library(evalid)
```

# Replication Instructions

The code in this collection replicates the results in "External Validity: Framework, Design, and Analysis". Run the files in the following order:

1. `01_recode_data.Rmd` : This file will download the original data for our empirical analyses from the original replication archives, where available. This includes:

- The Broockman and Kalla (2016) data from the replication archive (https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/WKR39N).  This data is then recoded based on the original authors' replication file.
- The 2016 CCES data from the replication archive (https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi%3A10.7910/DVN/GDF6Z0), with the variables used in our analysis recoded to match the coding for Broockman and Kalla (2016).  
- The recoded Broockman and Kalla (2016) and 2016 CCES data will be saved in a file called `recoded_broockman_kalla_data.RData`.
- The Bisgaard (2019) data from the replication archive (https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/FTFJTV). These will be saved in files called `study1.RData`, `study2.RData`, `study3.RData`, and `study4.RData`.
- The Young (2019) data from the replication archive (https://dataverse.harvard.edu/api/access/datafile/:persistentId?persistentId=doi:10.7910/DVN/UNNCTR/VWFHUP). This will be saved in a file called `young_2019.RData`.

2. `02_Main_Manuscript_Analyses.Rmd` : This file will conduct the analyses in the main manuscript, including the replication for Figures 7 and 8. It will generate PDF files of the figures and LaTeX files containing the numerical results found in the online appendix.

3. `03_Supplementary_Materials_1.Rmd` : This file will conduct the analyses in the supplementary materials, including the replication for Figures A1 - A6. It will generate PDF files of the figures and LaTeX files containing the numerical results found in the online appendix.

4. `04_Supplementary_Materials_2_Simulations.Rmd` : This file compiles the simulation analyses found in the online appendix, including the replication for Figures A8 and A9. It will generate PDF files of the figures and LaTeX files containing the numerical results found in the online appendix. To keep runtimes tractable, we split each of the simulations into component files. To run the simulations, users should run the `04_simulation_run_all.R` file.

5. `Egami_Hartman_External_Appendix_2_apsr.pdf` contains the online supplementary materials.

All generated results will be saved to a folder called `generated`. We have uploaded the files generated by the authors to this folder, but re-running the code will overwrite these results.
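
The run order above can be sketched as a small driver script. This is a minimal sketch and not part of the replication archive. It assumes that the working directory is the folder containing the files, that the `rmarkdown` package is installed, and that the simulations in `04_simulation_run_all.R` should finish before the R Markdown file that compiles them is rendered. The helper name `run_step` is hypothetical. Missing files are skipped with a message rather than raising an error.

```r
# Files in the order listed above (names taken from this README).
# Assumption: the simulation runner goes before the Rmd that compiles its output.
replication_steps <- c(
  "01_recode_data.Rmd",
  "02_Main_Manuscript_Analyses.Rmd",
  "03_Supplementary_Materials_1.Rmd",
  "04_simulation_run_all.R",
  "04_Supplementary_Materials_2_Simulations.Rmd"
)

# Hypothetical helper: render .Rmd files, source plain .R scripts.
run_step <- function(path) {
  if (!file.exists(path)) {
    message("Skipping missing file: ", path)
    return(invisible(FALSE))
  }
  if (grepl("\\.Rmd$", path)) {
    # A fresh environment keeps objects from one step out of the next.
    rmarkdown::render(path, envir = new.env())
  } else {
    source(path, local = new.env())
  }
  invisible(TRUE)
}

# Named logical vector: TRUE for each step that was run.
ran <- vapply(replication_steps, run_step, logical(1))
```

Because re-running the code overwrites the contents of `generated`, it may be worth copying that folder elsewhere before running the script.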

# Codebooks

We rely on many replication datasets for our empirical analyses. Where available, the replication code downloads the data directly from the replication archive. Links to the original replication archives, where the original codebooks can be found, are listed below.

We also provide 4 original datasets in the `uploaded_data` folder. Codebooks for these datasets appear below, after the archive links.

## Replication Archive Datasets

- Broockman and Kalla (2016): https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/WKR39N
- Ansolabehere and Schaffner (2017) (the 2016 CCES): https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi%3A10.7910/DVN/GDF6Z0
- Bisgaard (2019): https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/FTFJTV
- Young (2019): https://dataverse.harvard.edu/api/access/datafile/:persistentId?persistentId=doi:10.7910/DVN/UNNCTR/VWFHUP

## Original Dataset Codebooks

### Bisbee et al. (2017)

File: `./uploaded_data/Bisbee_etal_TableA1 - IV Paper (Published).csv`

Citation: Bisbee, James, Rajeev Dehejia, Cristian Pop-Eleches and Cyrus Samii. 2017. “Local Instruments, Global Extrapolation: External Validity of the Labor Supply–Fertility Local Average Treatment Effect.” Journal of Labor Economics 35(S1):S99–S147.

This file contains the results presented in Table A1 of the original publication.
  
- `Country`: Country
- `Year`: Year
- `GDBpc`: GDP per capita
- `LFP`: Female labor force participation
- `TFR`: Total fertility rate (children per mother)
- `Sex_Ratio`: Average of sex ratio, the number of male children divided by the number of female children minus 0.5
- `Sex_Ratio_SE`: SE for sex ratio
- `Education`: Average of mother's educational attainment (1 = illiterate, 2 = primary, 3 = secondary, and 4 = college or higher)
- `Education_SE`: SE for education
- `Age`: Mother's age at the birth of first child
- `Age_SE`: SE for age
- `FS`: First stage coefficient of instrument on having treatment
- `FS_SE`: SE for first stage coefficient
- `IV`: LATE estimate of effect of more than two children on economically active
- `IV_SE`: SE for IV estimate
- `Match_NBER`: Does the row match the results in a previously released NBER draft of the paper
- `NOTE`: Noted differences with NBER results
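
Because the file stores each estimate alongside its standard error, an approximate confidence interval can be rebuilt from the two columns. The sketch below assumes a normal approximation (estimate ± 1.96 × SE for a 95% interval). This is an illustration only and not necessarily the interval construction used in the original publication. The helper name `normal_ci` is hypothetical.

```r
# Hypothetical helper: normal-approximation confidence interval
# from a point estimate and its standard error.
normal_ci <- function(estimate, se, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)  # about 1.96 when level = 0.95
  data.frame(
    estimate = estimate,
    lower = estimate - z * se,
    upper = estimate + z * se
  )
}

bisbee_path <- "./uploaded_data/Bisbee_etal_TableA1 - IV Paper (Published).csv"
if (file.exists(bisbee_path)) {
  bisbee <- read.csv(bisbee_path)
  # Interval for the LATE estimate in `IV`, using `IV_SE`.
  iv_ci <- cbind(bisbee[c("Country", "Year")], normal_ci(bisbee$IV, bisbee$IV_SE))
  print(head(iv_ci))
}
```

The same helper applies to any other estimate and SE pair in the file, such as `FS` and `FS_SE`.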
  
### 2020 World Bank Classification Codes

File: `./uploaded_data/Dehejia_etal_TableA1_country_codings.csv`

This file contains the World Bank classification codes, based on 2020 cut-offs for GNI.  Historical files can be found at: https://datahelpdesk.worldbank.org/knowledgebase/articles/906519-world-bank-country-and-lending-groups.
  
- `Economy`: Economy / Country, meaning any territory for which authorities report economic and social statistics
- `Code`: Economy / Country code
- `Region`: World Bank geographic region
- `Income group`: Income grouping based on GNI (gross national income) per capita, using 2020 classification buckets
- `Lending category`: World Bank lending categories.  From the World Bank: "International Development Association (IDA) countries are those with low per capita incomes that lack the financial ability to borrow from the International Bank for Reconstruction and Development (IBRD). Blend countries are eligible for IDA loans but are also eligible for IBRD loans because they are financially creditworthy."
- `Other`: EMU (Economic and Monetary Union) membership and HIPC (Heavily Indebted Poor Countries) classification
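
Two column names in this file, `Income group` and `Lending category`, contain spaces. By default `read.csv()` rewrites such names into syntactic ones (for example `Income.group`). The following minimal sketch keeps the names exactly as documented above. It assumes the file is a standard comma-separated file with a header row. The helper name `read_with_original_names` is hypothetical.

```r
# Hypothetical helper: read a CSV without altering its column names.
read_with_original_names <- function(path) {
  # check.names = FALSE stops read.csv() from converting "Income group"
  # into the syntactic name "Income.group".
  read.csv(path, check.names = FALSE, stringsAsFactors = FALSE)
}

codes_path <- "./uploaded_data/Dehejia_etal_TableA1_country_codings.csv"
if (file.exists(codes_path)) {
  wb_codes <- read_with_original_names(codes_path)
  # Names with spaces need backticks with `$`, or quotes with `[[`.
  print(table(wb_codes$`Income group`))
  print(table(wb_codes[["Lending category"]]))
}
```

The same approach applies to `./uploaded_data/Dehejia_etal_TableA1.csv`, whose column names (for example `Year of Census`) also contain spaces.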
  
### Dehejia et al. (2021)

File: `./uploaded_data/Dehejia_etal_TableA1.csv`

Citation: Rajeev Dehejia, Cristian Pop-Eleches and Cyrus Samii. 2021. "From Local to Global: External Validity in a Fertility Natural Experiment", Journal of Business & Economic Statistics, 39(1):217-243.

Published appendix available at: https://www.tandfonline.com/doi/suppl/10.1080/07350015.2019.1639407/suppl_file/ubes_a_1639407_sm8806.pdf

This file contains the results presented in Table A1 of the original publication.

- `Country`: Country
- `Year of Census`: Year of census
- `Estimate for More Kids`: Point estimate for the effect on having more kids
- `SE for More Kids`: SE for the effect on having more kids
- `Est Economically Active`: Point estimate for the effect on whether the woman is economically active
- `SE Economically Active`: SE for the effect on whether the woman is economically active
  
### Dunning et al. (2019)

File: `./uploaded_data/MetaKeta 1 results - results.csv`

Citation: Dunning, Thad, et al. 2019. "Voter information campaigns and political accountability: Cumulative findings from a preregistered meta-analysis of coordinated trials." Science Advances 5(7):eaaw2612.

This file contains data from the Shiny app associated with the publication, available at: https://egap.shinyapps.io/metaketa shiny/. We collect point estimates and clustered standard errors for each country using the following settings: we do not include covariate controls; we exclude non-contested elections in the Uganda 2 study (default); we include both LCV chairs and councilors in the Uganda 2 study (default); and we weight each study equally (default). We focus on the estimates for the primary outcomes "vote for incumbent" and "voter turnout", but also include "Effort", "Dishonesty" and "Backlash" in the dataset. The original study reports the effect of the treatment for two subgroups: those for whom the information provided exceeds prior beliefs on candidate performance (positive or “good news”) and those for whom it falls short of their baseline beliefs (negative or “bad news”). We collect the estimates for each subgroup separately. Details can also be found in Section D.2 of the appendix.

- `Outcome`: Outcome 
- `Treatment`: Treatment 
- `Country`: Country
- `Estimate`: Point estimate from meta-analysis
- `SE`: Associated standard error for point estimate
- `t`: Associated $t$-statistic
- `p`: Associated $p$-value
- `N`: Sample size
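
The `Estimate`, `SE`, and `t` columns are related: a $t$-statistic is conventionally the point estimate divided by its standard error. The sketch below uses that relationship as a data-entry sanity check. It assumes the convention holds for the Shiny app's output and that the recorded values are rounded, so the tolerance of 0.05 is an arbitrary illustrative choice. The helper name `check_t_stat` is hypothetical.

```r
# Hypothetical helper: TRUE where estimate / se matches the recorded
# t-statistic within a tolerance that allows for rounding.
check_t_stat <- function(estimate, se, t_stat, tol = 0.05) {
  abs(estimate / se - t_stat) <= tol
}

metaketa_path <- "./uploaded_data/MetaKeta 1 results - results.csv"
if (file.exists(metaketa_path)) {
  metaketa <- read.csv(metaketa_path)
  ok <- check_t_stat(metaketa$Estimate, metaketa$SE, metaketa$t)
  # Rows that fail the check, or that have missing values, for manual review.
  print(metaketa[!ok | is.na(ok), c("Outcome", "Treatment", "Country")])
}
```

A flagged row is only a prompt to re-check the value against the app. Heavy rounding of small estimates can produce mismatches larger than the tolerance.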